KiaDev Intelligence

#cross-modal alignment · 29/04/2025

UniME: Advancing Multimodal Representations with a Two-Stage MLLM Framework

UniME introduces a two-stage framework for multimodal representation learning with MLLMs: textual knowledge distillation, followed by hard negative instruction tuning. The authors report that it outperforms existing models on multiple benchmarks.
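The idea behind "hard negative" training can be shown with a small, generic sketch. It is not UniME's training code. It is an ordinary InfoNCE contrastive loss that keeps only the `num_hard` negatives most similar to the query, which are the hardest ones. The function names, the default `temperature=0.05`, `num_hard=2`, and the use of plain Python lists as embeddings are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def info_nce_with_hard_negatives(
    query: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    num_hard: int = 2,          # hypothetical: how many hardest negatives to keep
    temperature: float = 0.05,  # hypothetical temperature value
) -> float:
    """Generic InfoNCE loss over the positive plus the hardest negatives.

    Illustrative sketch only (assumed names and defaults), not UniME's method.
    """
    pos_sim = cosine(query, positive)
    # Rank negatives by similarity to the query: the most similar are the hardest.
    hard = sorted((cosine(query, n) for n in negatives), reverse=True)[:num_hard]
    logits = [pos_sim / temperature] + [s / temperature for s in hard]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    # Negative log-probability that the positive is ranked first.
    return -(logits[0] - log_denom)


query = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [0.8, 0.6], [-1.0, 0.0]]  # only [0.8, 0.6] is "hard"
print(info_nce_with_hard_negatives(query, positive, negatives, num_hard=1))
```

Keeping only the most similar negatives concentrates the gradient on the confusable candidates. Easy negatives such as `[-1.0, 0.0]` contribute almost nothing to the softmax denominator anyway.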
